Licensed under the Creative Commons attribution-noncommercial license, http://creativecommons.org/licenses/by-nc/3.0/. Please share and remix noncommercially, mentioning its origin.
In this example, we revisit the analysis of He et al. (2010). This is intended to be a concrete example of a full data analysis. Whilst He et al. (2010) examined measles in 20 cities, we will focus on London alone.
To get started, we must install pomp if it is not already installed. The following commands install pomp from source (on github).
require(devtools)
install_github("kingaa/pomp")
If we were to run into trouble in the above, we would consult the package website.
Now we’ll load the packages we’ll need, and set the random seed, to allow reproducibility.
set.seed(594709947L)
require(ggplot2)
require(plyr)
require(reshape2)
require(magrittr)
require(pomp)
stopifnot(packageVersion("pomp")>="0.70-1")
Now we’ll load the data and covariates. The data are measles reports from 20 cities in England and Wales. We also have information on the population sizes and birth-rates in these cities; we’ll treat these variables as covariates.
load("twentycities.rda")
measles %>%
subset(town=="London" & year>=1950 & year<1964) %>%
mutate(time=(julian(date,origin=as.Date("1950-01-01")))/365.25+1950) %>%
subset(time>1950 & time<1964, select=c(time,cases)) -> dat
demog %>% subset(town=="London",select=-town) -> demog
Let’s plot the data and covariates.
dat %>% ggplot(aes(x=time,y=cases))+geom_line()
demog %>% melt(id="year") %>%
ggplot(aes(x=year,y=value))+geom_point()+
facet_wrap(~variable,ncol=1,scales="free_y")
demog %>%
summarize(
time=seq(from=min(year),to=max(year),by=1/12),
pop=predict(smooth.spline(x=year,y=pop),x=time)$y,
birthrate=predict(smooth.spline(x=year+0.5,y=births),x=time-4)$y
) -> covar
plot(pop~time,data=covar,type='l')
points(pop~year,data=demog)
plot(birthrate~time,data=covar,type='l')
points(births~year,data=demog)
plot(birthrate~I(time-4),data=covar,type='l')
points(births~I(year+0.5),data=demog)
Let’s evaluate the hypothesis that these data were generated by an SEIR model. The SEIR model is a compartmental model that, diagrammatically, looks as follows.
\(b = \text{births}\)We require a simulator for this model. The following implements a simulator.
rproc <- Csnippet("
double beta, br, seas, foi, dw, births;
double rate[6], trans[6];
// cohort effect
if (fabs(t-floor(t)-251.0/365.0) < 0.5*dt)
br = cohort*birthrate/dt + (1-cohort)*birthrate;
else
br = (1.0-cohort)*birthrate;
// term-time seasonality
t = (t-floor(t))*365.25;
if ((t>=7&&t<=100) || (t>=115&&t<=199) || (t>=252&&t<=300) || (t>=308&&t<=356))
seas = 1.0+amplitude*0.2411/0.7589;
else
seas = 1.0-amplitude;
// transmission rate
beta = R0*seas*(1.0-exp(-(gamma+mu)*dt))/dt;
// force of infection
foi = beta*pow(I+iota,alpha)/pop;
// white noise (extrademographic stochasticity)
dw = rgammawn(sigmaSE,dt);
rate[0] = dw*foi/dt; //infection rate (stochastic)
rate[1] = mu; // natural S death
rate[2] = sigma; // rate of ending of latent stage
rate[3] = mu; // natural E death
rate[4] = gamma; // recovery
rate[5] = mu; // natural I death
// Poisson births
births = rpois(br*dt);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
reulermultinom(2,E,&rate[2],dt,&trans[2]);
reulermultinom(2,I,&rate[4],dt,&trans[4]);
S += births - trans[0] - trans[1];
E += trans[0] - trans[2] - trans[3];
I += trans[2] - trans[4] - trans[5];
R = pop - S - E - I;
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
C += trans[4]; // true incidence
")
In the above, \(C\) represents the true incidence, i.e., the number of new infections occurring over an interval. Since recognized measles infections are quarantined, we argue that most infection occurs before case recognition so that true incidence is a measure of the number of individuals progressing from the I to the R compartment in a given interval.
We complete the process model definition by specifying the distribution of initial unobserved states. The following codes assume that the fraction of the population in each of the four compartments is known.
initlz <- Csnippet("
double m = pop/(S_0+E_0+I_0+R_0);
S = nearbyint(m*S_0);
E = nearbyint(m*E_0);
I = nearbyint(m*I_0);
R = nearbyint(m*R_0);
W = 0;
C = 0;
")
We’ll model both under-reporting and measurement error. We want \(\mathbb{E}[\text{cases}|C] = \rho\,C\), where \(C\) is the true incidence and \(0<\rho<1\) is the reporting efficiency. We’ll also assume that \(\mathrm{Var}[\text{cases}|C] = \rho\,(1-\rho)\,C + (\psi\,\rho\,C)^2\), where \(\psi\) quantifies overdispersion. Note that when \(\psi=0\), the variance-mean relation is that of the binomial distribution. To be specific, we’ll choose \(\text{cases|C} \sim f(\cdot|\rho,\psi,C)\), where \[f(c|\rho,\psi,C) = \Phi(c+\tfrac{1}{2},\rho\,C,\rho\,(1-\rho)\,C+(\psi\,\rho\,C)^2)-\Phi(c-\tfrac{1}{2},\rho\,C,\rho\,(1-\rho)\,C+(\psi\,\rho\,C)^2),\] where \(\Phi(x,\mu,\sigma^2)\) is the c.d.f. of the normal distribution with mean \(\mu\) and variance \(\sigma^2\).
The following computes \(\mathbb{P}[\text{cases}|C]\).
dmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
if (cases > 0.0) {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
} else {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
}
")
The following codes simulate \(\text{cases} | C\).
rmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
cases = rnorm(m,sqrt(v)+tol);
if (cases > 0.0) {
cases = nearbyint(cases);
} else {
cases = 0.0;
}
")
pomp objectWe put all the model components together with the data in a call to pomp:
dat %>%
pomp(t0=with(dat,2*time[1]-time[2]),
time="time",
rprocess=euler.sim(rproc,delta.t=1/365.25),
initializer=initlz,
dmeasure=dmeas,
rmeasure=rmeas,
covar=covar,
tcovar="time",
zeronames=c("C","W"),
statenames=c("S","E","I","R","C","W"),
paramnames=c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
) -> m1
The following codes plot the data and covariates together.
m1 %>% as.data.frame() %>%
melt(id="time") %>%
ggplot(aes(x=time,y=value))+
geom_line()+
facet_grid(variable~.,scales="free_y")
He et al. (2010) estimated the parameters of this model. The following attaches the
readRDS("He2010_mles.rds") %>%
subset(town=="London") -> mle
paramnames <- c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
theta <- unlist(mle[paramnames])
kable(t(mle))
| 23 | |
|---|---|
| town | London |
| loglik | -3804.935 |
| loglik.sd | 0.1570628 |
| mu | 0.02 |
| delay | 4 |
| sigma | 28.88585 |
| gamma | 30.40555 |
| rho | 0.487543 |
| R0 | 56.78228 |
| amplitude | 0.5540616 |
| alpha | 0.9755556 |
| iota | 2.895151 |
| cohort | 0.5570074 |
| psi | 0.1158933 |
| S_0 | 0.02970213 |
| E_0 | 5.17412e-05 |
| I_0 | 5.1426e-05 |
| R_0 | 0.9701947 |
| sigmaSE | 0.0877794 |
Verify that we get the same likelihood as He et al. (2010).
require(foreach)
require(doMC)
registerDoMC()
foreach(i=1:4,
.packages="pomp",
.options.multicore=mcopts) %dopar% {
pfilter(m1,Np=10000,params=theta)
} -> pfs
logmeanexp(sapply(pfs,logLik),se=TRUE)
## se
## -3801.8096649 0.9033229
Simulations at the MLE.
m1 %>%
simulate(params=theta,nsim=10,as.data.frame=TRUE,include.data=TRUE) %>%
ggplot(aes(x=time,y=cases,group=sim,color=(sim=="data")))+
guides(color=FALSE)+
geom_line()+facet_wrap(~sim,ncol=2)
m1 %>%
simulate(params=theta,nsim=100,as.data.frame=TRUE,include.data=TRUE) %>%
subset(select=c(time,sim,cases)) %>%
mutate(data=sim=="data") %>%
ddply(~time+data,summarize,
p=c(0.05,0.5,0.95),q=quantile(cases,prob=p,names=FALSE)) %>%
mutate(p=mapvalues(p,from=c(0.05,0.5,0.95),to=c("lo","med","hi")),
data=mapvalues(data,from=c(TRUE,FALSE),to=c("data","simulation"))) %>%
dcast(time+data~p,value.var='q') %>%
ggplot(aes(x=time,y=med,color=data,fill=data,ymin=lo,ymax=hi))+
geom_ribbon(alpha=0.2)
The parameters are constrained to be positive, and some of them are constrained to lie between \(0\) and \(1\). We can turn the likelihood maximization problem into an unconstrained maximization problem by transforming the parameters. The following Csnippets implement such a transformation and its inverse.
toEst <- Csnippet("
Tmu = log(mu);
Tsigma = log(sigma);
Tgamma = log(gamma);
Talpha = log(alpha);
Tiota = log(iota);
Trho = logit(rho);
Tcohort = logit(cohort);
Tamplitude = logit(amplitude);
TsigmaSE = log(sigmaSE);
Tpsi = log(psi);
TR0 = log(R0);
to_log_barycentric (&TS_0, &S_0, 4);
")
fromEst <- Csnippet("
Tmu = exp(mu);
Tsigma = exp(sigma);
Tgamma = exp(gamma);
Talpha = exp(alpha);
Tiota = exp(iota);
Trho = expit(rho);
Tcohort = expit(cohort);
Tamplitude = expit(amplitude);
TsigmaSE = exp(sigmaSE);
Tpsi = exp(psi);
TR0 = exp(R0);
from_log_barycentric (&TS_0, &S_0, 4);
")
m1 <- pomp(m1,toEstimationScale=toEst,
fromEstimationScale=fromEst,
statenames=c("S","E","I","R","C","W"),
paramnames=c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0"))
He, D., E. L. Ionides, and A. A. King. 2010. Plug-and-play inference for disease dynamics: Measles in large and small populations as a case study. Journal of The Royal Society Interface 7:271–283.